Conditional Dependencies: A Principled Approach to Improving Data Quality

Authors

  • Wenfei Fan
  • Floris Geerts
  • Xibei Jia
Abstract

Real-life data is often dirty and costs businesses worldwide billions of pounds each year. This paper presents a promising approach to improving data quality. It effectively detects and fixes inconsistencies in real-life data based on conditional dependencies, an extension of database dependencies that enforces bindings of semantically related data values. It accurately identifies records from unreliable data sources by leveraging relative candidate keys, an extension of keys for relations that supports similarity and matching operators across relations. In contrast to traditional dependencies, which were developed for improving the quality of schema, the revised constraints are proposed to improve the quality of data. These constraints yield practical techniques for data repairing and record matching in a uniform framework.

Related resources

Discovering Conditional Functional Dependencies to Detect Data Inconsistencies

Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an approach that efficiently and robustly discovers conditional functional dependencies for detecting inconsistencies in data and hence improves data quality. We evaluate our approach empirically...

Mining Constant Conditional Functional Dependencies for Improving Data Quality

This paper applies data mining techniques to data cleaning as an effective means of discovering Constant Conditional Functional Dependencies (CCFDs) from relational databases. These CCFDs are used as business rules for context-dependent data validation. Conditional Functional Dependencies (CFDs) are an extension of functional dependencies (FDs) which captures the consistency of data by su...

Data-driven extensions to HMM statistical dependencies

In this paper, a new technique is introduced that relaxes the HMM conditional independence assumption in a principled way. Without increasing the number of states, the modeling power of an HMM is increased by including only those additional probabilistic dependencies (to the surrounding observation context) that are believed to be both relevant and discriminative. Conditional mutual information...

Self-Organizing Maps in data analysis - notes on overfitting and overinterpretation

The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, that are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searching for dependencies in the data. We point out that simple use of the SOM may lead to excessive number ...

Self-Organizing Map in Data-Analysis - Notes on Overfitting and Overinterpretation

The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, that are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searching for dependencies in the data. We point out that simple use of the SOM may lead to excessive number of...

Publication date: 2009